Introduction

This is a simple project aiming to show R skills. Through this project I will share basic skills of loading and formatting data. Multiple tables are joined and grouped in order to present complete summary and explicit statistics. At the end, data are presented in graphics

Used database is provided with R package for flights departed in 2013 from New York City.

## Table flights
## Rows: 336,776
## Columns: 19
## $ year           <int> 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2…
## $ month          <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ day            <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ dep_time       <int> 517, 533, 542, 544, 554, 554, 555, 557, 557, 558, 558, …
## $ sched_dep_time <int> 515, 529, 540, 545, 600, 558, 600, 600, 600, 600, 600, …
## $ dep_delay      <dbl> 2, 4, 2, -1, -6, -4, -5, -3, -3, -2, -2, -2, -2, -2, -1…
## $ arr_time       <int> 830, 850, 923, 1004, 812, 740, 913, 709, 838, 753, 849,…
## $ sched_arr_time <int> 819, 830, 850, 1022, 837, 728, 854, 723, 846, 745, 851,…
## $ arr_delay      <dbl> 11, 20, 33, -18, -25, 12, 19, -14, -8, 8, -2, -3, 7, -1…
## $ carrier        <chr> "UA", "UA", "AA", "B6", "DL", "UA", "B6", "EV", "B6", "…
## $ flight         <int> 1545, 1714, 1141, 725, 461, 1696, 507, 5708, 79, 301, 4…
## $ tailnum        <chr> "N14228", "N24211", "N619AA", "N804JB", "N668DN", "N394…
## $ origin         <chr> "EWR", "LGA", "JFK", "JFK", "LGA", "EWR", "EWR", "LGA",…
## $ dest           <chr> "IAH", "IAH", "MIA", "BQN", "ATL", "ORD", "FLL", "IAD",…
## $ air_time       <dbl> 227, 227, 160, 183, 116, 150, 158, 53, 140, 138, 149, 1…
## $ distance       <dbl> 1400, 1416, 1089, 1576, 762, 719, 1065, 229, 944, 733, …
## $ hour           <dbl> 5, 5, 5, 5, 6, 5, 6, 6, 6, 6, 6, 6, 6, 6, 6, 5, 6, 6, 6…
## $ minute         <dbl> 15, 29, 40, 45, 0, 58, 0, 0, 0, 0, 0, 0, 0, 0, 0, 59, 0…
## $ time_hour      <dttm> 2013-01-01 05:00:00, 2013-01-01 05:00:00, 2013-01-01 0…
## Table airlines
## Rows: 16
## Columns: 2
## $ carrier <chr> "9E", "AA", "AS", "B6", "DL", "EV", "F9", "FL", "HA", "MQ", "O…
## $ name    <chr> "Endeavor Air Inc.", "American Airlines Inc.", "Alaska Airline…
## Table airports
## Rows: 1,458
## Columns: 8
## $ faa   <chr> "04G", "06A", "06C", "06N", "09J", "0A9", "0G6", "0G7", "0P2", "…
## $ name  <chr> "Lansdowne Airport", "Moton Field Municipal Airport", "Schaumbur…
## $ lat   <dbl> 41.13047, 32.46057, 41.98934, 41.43191, 31.07447, 36.37122, 41.4…
## $ lon   <dbl> -80.61958, -85.68003, -88.10124, -74.39156, -81.42778, -82.17342…
## $ alt   <dbl> 1044, 264, 801, 523, 11, 1593, 730, 492, 1000, 108, 409, 875, 10…
## $ tz    <dbl> -5, -6, -6, -5, -5, -5, -5, -5, -5, -8, -5, -6, -5, -5, -5, -5, …
## $ dst   <chr> "A", "A", "A", "A", "A", "A", "A", "A", "U", "A", "A", "U", "A",…
## $ tzone <chr> "America/New_York", "America/Chicago", "America/Chicago", "Ameri…
## Table planes
## Rows: 3,322
## Columns: 9
## $ tailnum      <chr> "N10156", "N102UW", "N103US", "N104UW", "N10575", "N105UW…
## $ year         <int> 2004, 1998, 1999, 1999, 2002, 1999, 1999, 1999, 1999, 199…
## $ type         <chr> "Fixed wing multi engine", "Fixed wing multi engine", "Fi…
## $ manufacturer <chr> "EMBRAER", "AIRBUS INDUSTRIE", "AIRBUS INDUSTRIE", "AIRBU…
## $ model        <chr> "EMB-145XR", "A320-214", "A320-214", "A320-214", "EMB-145…
## $ engines      <int> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …
## $ seats        <int> 55, 182, 182, 182, 55, 182, 182, 182, 182, 182, 55, 55, 5…
## $ speed        <int> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
## $ engine       <chr> "Turbo-fan", "Turbo-fan", "Turbo-fan", "Turbo-fan", "Turb…
## Table weather
## Rows: 26,115
## Columns: 15
## $ origin     <chr> "EWR", "EWR", "EWR", "EWR", "EWR", "EWR", "EWR", "EWR", "EW…
## $ year       <int> 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013, 2013,…
## $ month      <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ day        <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ hour       <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16, 17, 18, …
## $ temp       <dbl> 39.02, 39.02, 39.02, 39.92, 39.02, 37.94, 39.02, 39.92, 39.…
## $ dewp       <dbl> 26.06, 26.96, 28.04, 28.04, 28.04, 28.04, 28.04, 28.04, 28.…
## $ humid      <dbl> 59.37, 61.63, 64.43, 62.21, 64.43, 67.21, 64.43, 62.21, 62.…
## $ wind_dir   <dbl> 270, 250, 240, 250, 260, 240, 240, 250, 260, 260, 260, 330,…
## $ wind_speed <dbl> 10.35702, 8.05546, 11.50780, 12.65858, 12.65858, 11.50780, …
## $ wind_gust  <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 20.…
## $ precip     <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ pressure   <dbl> 1012.0, 1012.3, 1012.5, 1012.2, 1011.9, 1012.4, 1012.2, 101…
## $ visib      <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10,…
## $ time_hour  <dttm> 2013-01-01 01:00:00, 2013-01-01 02:00:00, 2013-01-01 03:00…

There were 336776 flights. The longest one was of 4983 miles but the shortest one was only 17 miles. The longest time in air was of 695 minutes but the shortest one was only 20 minutes long. Regarding maximum delay it was of 1301 minutes. On the other hand, the earliest departure was 43 minutes before scheduled time.

Flights were served by 16 carriers.

There were 1458 destinations.

Through the New York City airport there passed 3322 different planes. The oldest one was manufactured in 1956 and the newest one was from 2013. The largest plane was designed to carry 450 passengers but the smallest one was only for 2. The average cruising speed was of 236.8 mph.

The strongest wind during take off or landing was 1048 mph but the average one was only 10 mph. The strongest precipitations were of 1,21 inch. Mean visibility was of 9,2 miles but the worst one was of 0,0 miles.

Data correction

This section present data changes in order to facilitate further analysis.

## [1] 73941

Total quantity of NA values is 73941.

Data collections

This section present initial grouped statistics. Date in flights table was saved in a new column with date type. According to date, corresponding quarter was assigned to each flight. General statistics were calculated for each quarter.

Quarter summary of flights
quart mean_dep_delay mean_arr_delay mean_air_time mean_distance number_flights
1 11.415210 5.857851 151.4962 1006.869 80789
2 15.878225 10.310172 149.6872 1045.553 85369
3 13.794974 6.400052 146.1716 1054.259 86326
4 9.363125 4.956591 155.5180 1051.178 84292

The lowest number of flights was made in the first quarter. In the same time, the mean flight distance was the shortest. On the other hand, the highest value of delays was noted in the second quarter.

Table join

Short statistics of flights were collected for each carrier. They were arranged according to the mean arrival delay, from the highest to the lowest. As we can see, the highest mean delay were above 20 minutes for two airlines: Frontier Airlines Inc. and AirTran Airways Corporation.

The worst airline according to mean arrival delay
name mean_dep_delay mean_arr_delay mean_air_time mean_distance number_flights
Frontier Airlines Inc. 20.215543 21.9207048 229.59912 1620.0000 685
AirTran Airways Corporation 18.726075 20.1159055 101.14394 664.8294 3260
ExpressJet Airlines Inc. 19.955390 15.7964311 90.07619 562.9917 54173
Mesa Airlines Inc. 18.996330 15.5569853 65.74081 375.0333 601
SkyWest Airlines Inc. 12.586207 11.9310345 83.48276 500.8125 32
Envoy Air 10.552041 10.7747334 91.18025 569.5327 26397
Southwest Airlines Co. 17.711744 9.6491199 147.82481 996.2691 12275
JetBlue Airways 13.022522 9.4579733 151.17717 1068.6215 54635
Endeavor Air Inc. 16.725769 7.3796692 86.78160 530.2358 18460
United Air Lines Inc. 12.106073 3.5580111 211.79135 1529.1149 58665
US Airways Inc. 3.782418 2.1295951 88.57380 553.4563 20536
Virgin America 12.869421 1.7644644 337.00235 2499.4822 5162
Delta Air Lines Inc. 9.264504 1.6443409 173.68880 1236.9012 48110
American Airlines Inc. 8.586016 0.3642909 188.82230 1340.2360 32729
Hawaiian Airlines Inc. 4.900585 -6.9152047 623.08772 4983.0000 342
Alaska Airlines Inc. 5.804775 -9.9308886 325.61777 2402.0000 714

Having that in mind, we select the flights of those two airlines and we compare wind conditions for the worst 15 flights.

15 records of the highest arrival delay for two selected airlines with respect to wind parameters
arr_delay carrier origin dest time_hour wind_speed precip visib
834 F9 LGA DEN 2013-02-10 08:00:00 8.05546 0.00 10.0
572 FL LGA ATL 2013-09-12 14:00:00 13.80936 0.00 9.0
551 FL LGA ATL 2013-07-10 14:00:00 12.65858 0.00 8.0
505 FL LGA ATL 2013-09-02 14:00:00 4.60312 0.01 6.0
461 FL LGA ATL 2013-05-23 17:00:00 10.35702 0.00 7.0
436 FL LGA ATL 2013-03-08 08:00:00 19.56326 0.04 0.5
433 FL LGA CAK 2013-12-18 09:00:00 9.20624 0.00 10.0
410 F9 LGA DEN 2013-03-08 17:00:00 14.96014 0.00 10.0
400 FL LGA CAK 2013-08-12 14:00:00 6.90468 0.00 10.0
395 FL LGA CAK 2013-07-10 14:00:00 12.65858 0.00 8.0
390 FL LGA ATL 2013-07-10 17:00:00 14.96014 0.00 10.0
382 FL LGA ATL 2013-07-07 18:00:00 12.65858 0.00 10.0
374 FL LGA ATL 2013-07-23 14:00:00 10.35702 0.00 10.0
345 FL LGA CAK 2013-11-10 09:00:00 12.65858 0.00 10.0
345 F9 LGA DEN 2013-05-23 13:00:00 16.11092 0.00 10.0

This summary shows that for the highest arrival delay in majority of cases the weather was not a problem. The only exception is the flight on 8th march 2023 at 8 AM, when visibility was below 1 mile and wind speed was around 20 mph.

Graphs

This section presents main graphics.